We describe how to calculate the sizes of all giant connected components of a directed graph,
including the strongly connected one. The World Wide Web, in particular, belongs to this class of
directed networks. The results are obtained for graphs with statistically uncorrelated vertices
and an arbitrary joint in,out-degree distribution $P(k_i,k_o)$. We show that if $P(k_i,k_o)$ does not
factorize, the relative size of the giant strongly connected component deviates from the product of
the relative sizes of the giant in- and out-components. The calculations of the relative sizes of all the
giant components are demonstrated using the simplest examples. We explain that the giant strongly
connected component may be less resilient to random damage than the giant weakly connected one.

The giant components of a network are the components whose relative sizes
remain finite (nonzero) in the large-network limit. Knowledge of these sizes
provides the basic information about the global topology of a network.
Understanding the topological structure of networks and its change under
external action is the central problem of the statistical physics of random
networks. In fact, this is a natural generalization of general percolation
theory.

The most interesting networks in Nature, including the World Wide Web, are
directed graphs, i.e., their vertices are connected by directed edges
[?], [9-13]. In the general case, the structure of a directed graph looks as
shown in Fig. 1 (all the notions are introduced and explained in the figure
caption). In particular, the World Wide Web has such a structure.

In Refs. [?], the previous strong results of mathematicians [14], [15] were
developed, and a general theory of percolation phenomena in networks with
arbitrary degree distributions and statistically uncorrelated (randomly
connected) vertices was proposed. Of course, the last assumption does not hold
for most growing nets in Nature. Nevertheless, the direct conclusions of such
an approach proved to explain the behavior of real networks.

In Ref. [?], it was shown how to find the relative sizes of the following
giant components of directed graphs with statistically uncorrelated vertices:
(i) the giant weakly connected component, $W$; (ii) the giant in-component,
$I$; and (iii) the giant out-component, $O$. We emphasize that, for brevity and
consistency, we use definitions of the giant in- and out-components that
differ from those in Refs. [?] (see the caption of Fig. 1).

Here we demonstrate how to calculate the relative size $S$ of perhaps the most
important part of the directed graph, its giant strongly connected component
(iv). (In the GSCC, every pair of vertices is connected in both directions,
i.e., from one of the vertices, one can approach the other by moving either
along or against the edge directions.) This allows us to completely describe
the total structure of directed graphs with arbitrary degree distributions and
statistically uncorrelated vertices. For the demonstration, we use networks
with the simplest degree distributions providing non-trivial results.

We briefly recall a very useful approach of Ref. [?]. Z-transforms (or
generating functions) are used. For the undirected graph,
$\Phi(x) = \sum_k P(k)x^k$, and, for the directed one,
$\Phi(x,y) = \sum_{k_i,k_o} P(k_i,k_o)x^{k_i}y^{k_o}$. Here,
$P(k) = P^{(w)}(k) = \sum_{k_i} P(k_i,k-k_i)$ is the degree distribution
($k = k_i + k_o$ is the total number of connections of a node) and
$P(k_i,k_o)$ is the joint distribution of in- and out-degrees. When all the
connections are inside the network, the average in- and out-degrees are equal:
$\partial_x\Phi(x,1)|_{x=1} = \partial_y\Phi(1,y)|_{y=1} \equiv z^{(d)}$.
Therefore the average degree is $z = 2z^{(d)}$. If one ignores the directedness
of edges, the degree distribution of the directed network, in the
Z-representation, takes the form $\Phi^{(w)}(x) = \Phi(x,x)$. In this case, the
distribution of the number of connections minus one of either end vertex of a
randomly chosen edge corresponds to $\Phi_1^{(w)}(x) = \Phi^{(w)\prime}(x)/z$.
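The definitions above can be sketched numerically. The snippet below is only an illustration: the dictionary representation `P[(k_i, k_o)]`, the function names, and the parameter value `p = 0.3` are assumptions made for this example, not part of the original text.

```python
from collections import defaultdict


def phi(P, x, y):
    """Phi(x, y) = sum over (k_i, k_o) of P(k_i, k_o) * x**k_i * y**k_o."""
    return sum(prob * x**ki * y**ko for (ki, ko), prob in P.items())


def mean_in_out(P):
    """Average in-degree and out-degree; both should equal z^(d)."""
    z_in = sum(prob * ki for (ki, ko), prob in P.items())
    z_out = sum(prob * ko for (ki, ko), prob in P.items())
    return z_in, z_out


def total_degree_dist(P):
    """P^(w)(k): distribution of k = k_i + k_o with edge directions ignored."""
    Pw = defaultdict(float)
    for (ki, ko), prob in P.items():
        Pw[ki + ko] += prob
    return dict(Pw)


# Hypothetical joint distribution used only for illustration.
p = 0.3
P = {(0, 1): p, (1, 0): p, (3, 3): 1 - 2 * p}

z_in, z_out = mean_in_out(P)
z = z_in + z_out                      # average total degree, z = 2 z^(d)
Pw = total_degree_dist(P)

# Phi^(w)(x) computed from P^(w)(k) should coincide with Phi(x, x).
x = 0.7
phi_w = sum(prob * x**k for k, prob in Pw.items())
print(z_in, z_out, z)
print(phi_w, phi(P, x, x))
```

Storing only the nonzero entries of $P(k_i,k_o)$ keeps the sums short for sparse distributions such as this one.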

The giant weakly connected component exists if $\Phi_1^{(w)\prime}(1) > 1$,
which corresponds to the well-known criterion of Molloy and Reed

$$\sum_k k(k-2)P(k) > 0. \tag{1}$$



The size of the GWCC, $W$, can be easily obtained from the relations [?], [15]

$$W = 1 - \Phi^{(w)}(t_c), \qquad t_c = \Phi_1^{(w)}(t_c). \tag{2}$$
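As a rough numerical sketch of Eqs. (1) and (2), one can iterate the map $t \to \Phi_1^{(w)}(t)$ starting from $t = 0$ and evaluate $W$ at the resulting fixed point. The function names, the tolerance, and the two sample total-degree distributions below are assumptions chosen for illustration only.

```python
def molloy_reed(Pw):
    """Left-hand side of Eq. (1): sum over k of k (k - 2) P(k)."""
    return sum(k * (k - 2) * prob for k, prob in Pw.items())


def gwcc_size(Pw, tol=1e-13, max_iter=100_000):
    """Return (W, t_c) with W = 1 - Phi_w(t_c) and t_c = Phi1_w(t_c)."""
    z = sum(k * prob for k, prob in Pw.items())  # average degree

    def phi_w(t):
        return sum(prob * t**k for k, prob in Pw.items())

    def phi1_w(t):
        # Phi_w'(t) / z; terms with k = 0 do not contribute to the derivative.
        return sum(k * prob * t**(k - 1) for k, prob in Pw.items() if k > 0) / z

    t = 0.0  # iterating upward from 0 approaches the smallest root in [0, 1]
    for _ in range(max_iter):
        t_next = phi1_w(t)
        converged = abs(t_next - t) < tol
        t = t_next
        if converged:
            break
    return 1.0 - phi_w(t), t


# Hypothetical total-degree distributions {k: P(k)}.
Pw_above = {1: 0.60, 6: 0.40}   # Eq. (1) satisfied
Pw_below = {1: 0.98, 6: 0.02}   # Eq. (1) violated

W_above, t_above = gwcc_size(Pw_above)
W_below, t_below = gwcc_size(Pw_below)
print(molloy_reed(Pw_above), W_above)
print(molloy_reed(Pw_below), W_below)
```

Plain fixed-point iteration is used here for transparency; it slows down close to the threshold, where a bracketing root finder would be a reasonable substitute.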



From Eq. (1) one sees that the existence of the GWCC crucially depends on the
fraction of dead ends in the network. Indeed, $P(1)$ is the only term in
Eq. (1) that prevents the GWCC, since $k(k-2)$ is negative only for $k = 1$.
In Fig. 2(a), the evolution of the giant connected component of the undirected
graph induced by the change of some control parameter is schematically shown.
From Eq. (1), it is also clear that the divergence of the second moment of the
degree distribution makes the GWCC extremely stable [?]. If the exponent
$\gamma$ of the power-law degree distribution $P(k) \sim k^{-\gamma}$ is less
than or equal to 3, one has to remove at random almost all the vertices or
edges of the network to eliminate the GWCC.

In a similar way, it is easy to study the GIN and GOUT components of the
directed network. One introduces the Z-transform of the out-degree
distribution of the vertex approachable by following a randomly chosen edge
when one moves along the edge direction,
$\Phi_1^{(o)}(y) = \partial_x\Phi(x,y)|_{x=1}/z^{(d)}$. Also,
$\Phi_1^{(i)}(x) = \partial_y\Phi(x,y)|_{y=1}/z^{(d)}$ corresponds to the
in-degree distribution of the vertex which one can approach moving against the
edge direction. The GIN and GOUT are present if
$\Phi_1^{(i)\prime}(1) = \Phi_1^{(o)\prime}(1) = \partial^2_{xy}\Phi(x,y)|_{x=1,y=1}/z^{(d)} > 1$,
that is

$$\sum_{k_i,k_o}(2k_ik_o - k_i - k_o)P(k_i,k_o) = 2\sum_{k_i,k_o}k_i(k_o - 1)P(k_i,k_o) = 2\sum_{k_i,k_o}k_o(k_i - 1)P(k_i,k_o) > 0. \tag{3}$$
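The criterion of Eq. (3) is a single weighted sum, so it is easy to evaluate for any tabulated joint distribution. The sketch below does this for the three-point correlated distribution that the text introduces further on; the function names and the sampled values of `p` are assumptions made for illustration.

```python
def criterion_lhs(P):
    """Left-hand side of Eq. (3): sum of (2 k_i k_o - k_i - k_o) P(k_i, k_o)."""
    return sum((2 * ki * ko - ki - ko) * prob for (ki, ko), prob in P.items())


def has_gin_gout(P):
    """True when Eq. (3) holds, i.e. the GIN and GOUT are present."""
    return criterion_lhs(P) > 0


def correlated_example(p):
    """P = p (d_{ki,0} d_{ko,1} + d_{ki,1} d_{ko,0}) + (1 - 2p) d_{ki,3} d_{ko,3}."""
    return {(0, 1): p, (1, 0): p, (3, 3): 1 - 2 * p}


# For this distribution the sum reduces to 12 (1 - 2p) - 2p = 12 - 26 p,
# so the criterion changes sign at p = 6/13.
lhs_low = criterion_lhs(correlated_example(0.30))
lhs_high = criterion_lhs(correlated_example(0.47))
lhs_threshold = criterion_lhs(correlated_example(6 / 13))
print(lhs_low, has_gin_gout(correlated_example(0.30)))
print(lhs_high, has_gin_gout(correlated_example(0.47)))
```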

In this case, there exist non-trivial solutions of the equations

$$x_c = \Phi_1^{(i)}(x_c), \qquad y_c = \Phi_1^{(o)}(y_c). \tag{4}$$



They have the following meanings: $x_c < 1$ is the probability that the
connected component obtained by moving against the edge directions, starting
from a randomly chosen edge, is finite; $y_c < 1$ is the probability that the
connected component obtained by moving along the edge directions, starting
from a randomly chosen edge, is finite. Then $P(k_i,k_o)x_c^{k_i}$ and
$P(k_i,k_o)y_c^{k_o}$ are the probabilities that a vertex with $k_i$ incoming
and $k_o$ outgoing edges has finite in- and out-components, respectively. The
in- and out-components of a vertex are the sets of vertices which are
approachable from it moving against and along edges, respectively, plus the
vertex itself. Summation of these expressions over $(k_i,k_o)$ yields the total
probabilities that the in- and out-components of a randomly chosen vertex are
finite, respectively. Therefore, they are equal to $\Phi(x_c,1)$ and
$\Phi(1,y_c)$, respectively. Thus, the relative sizes of the GIN and GOUT are

$$I = 1 - \Phi(x_c,1), \qquad O = 1 - \Phi(1,y_c). \tag{5}$$
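A minimal sketch of Eqs. (4) and (5): solve $x_c = \Phi_1^{(i)}(x_c)$ and $y_c = \Phi_1^{(o)}(y_c)$ by fixed-point iteration from zero, then read off $I$ and $O$. The helper names, the tolerance, and the sample distribution are assumptions introduced for this example.

```python
def fixed_point(g, tol=1e-13, max_iter=100_000):
    """Iterate t -> g(t) from t = 0; approaches the smallest root in [0, 1]."""
    t = 0.0
    for _ in range(max_iter):
        t_next = g(t)
        if abs(t_next - t) < tol:
            return t_next
        t = t_next
    return t


def in_out_components(P):
    """Return (x_c, y_c, I, O) for a joint distribution P[(k_i, k_o)]."""
    z_d = sum(prob * ki for (ki, ko), prob in P.items())  # = mean out-degree

    def phi(x, y):
        return sum(prob * x**ki * y**ko for (ki, ko), prob in P.items())

    def phi1_i(x):
        # d/dy Phi(x, y) at y = 1, divided by z^(d)
        return sum(prob * ko * x**ki for (ki, ko), prob in P.items()) / z_d

    def phi1_o(y):
        # d/dx Phi(x, y) at x = 1, divided by z^(d)
        return sum(prob * ki * y**ko for (ki, ko), prob in P.items()) / z_d

    x_c = fixed_point(phi1_i)
    y_c = fixed_point(phi1_o)
    return x_c, y_c, 1.0 - phi(x_c, 1.0), 1.0 - phi(1.0, y_c)


# Hypothetical joint distribution (symmetric in k_i and k_o, so I = O).
P = {(0, 1): 0.3, (1, 0): 0.3, (3, 3): 0.4}
x_c, y_c, I, O = in_out_components(P)
print(x_c, y_c, I, O)
```

For this particular distribution Eq. (4) reduces to $x = 0.2 + 0.8x^3$, whose non-trivial root is $(\sqrt{2}-1)/2$, which gives a quick check of the iteration.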



Here we show that from Eq. (4) it is possible to find not only $I$ and $O$ but
also the relative size $S$ of the GSCC, using considerations similar to those
of Ref. [?]. Suppose that a vertex has $k_i$ incoming and $k_o$ outgoing edges.
They are assumed to be statistically independent. Then the probability that
all the incoming edges come from finite in-components is $x_c^{k_i}$. The
probability that this vertex has an infinite in-component is equal to
$1 - x_c^{k_i}$, that is, at least one of the $k_i$ incoming edges has to come
from the GIN. Similarly, $1 - y_c^{k_o}$ is the probability that the vertex has
an infinite out-component. The vertex belongs to the GSCC if its in- and
out-components are both infinite; the corresponding probability is equal to
$(1 - x_c^{k_i})(1 - y_c^{k_o})$. Then the total probability that a vertex
belongs to the GSCC is equal to
$\sum_{k_i,k_o} P(k_i,k_o)(1 - x_c^{k_i})(1 - y_c^{k_o})$. Finally, the relative
size of the GSCC takes the form

$$S = \sum_{k_i,k_o} P(k_i,k_o)(1 - x_c^{k_i})(1 - y_c^{k_o}) = 1 - \Phi(x_c,1) - \Phi(1,y_c) + \Phi(x_c,y_c). \tag{6}$$



Therefore, $\Phi(x_c,y_c)$ is equal to the probability that both the in- and
out-components of a vertex are finite. One can write
$\Phi(x_c,y_c) = 1 - W + T$. Knowing $W$, $S$, $I$, and $O$, it is easy to
obtain the relative size of tendrils,

$$T = W + S - I - O. \tag{7}$$
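Putting Eqs. (2) and (4)-(7) together, all five relative sizes can be computed from one joint distribution. The following is a sketch under stated assumptions: the function names, the dictionary layout, and the sample distribution are hypothetical, and the code assumes that the mean in- and out-degrees are equal so that $z = 2z^{(d)}$.

```python
def fixed_point(g, tol=1e-13, max_iter=100_000):
    """Iterate t -> g(t) from t = 0; approaches the smallest root in [0, 1]."""
    t = 0.0
    for _ in range(max_iter):
        t_next = g(t)
        if abs(t_next - t) < tol:
            return t_next
        t = t_next
    return t


def giant_components(P):
    """Return W, I, O, S, T (and checks) for a joint distribution P[(k_i, k_o)]."""
    items = list(P.items())
    z_d = sum(prob * ki for (ki, ko), prob in items)

    def phi(x, y):
        return sum(prob * x**ki * y**ko for (ki, ko), prob in items)

    # Eq. (4): directed fixed points.
    x_c = fixed_point(lambda x: sum(pr * ko * x**ki for (ki, ko), pr in items) / z_d)
    y_c = fixed_point(lambda y: sum(pr * ki * y**ko for (ki, ko), pr in items) / z_d)
    # Eq. (2): undirected view, Phi_w(t) = Phi(t, t) and z = 2 z^(d).
    t_c = fixed_point(lambda t: sum(pr * (ki + ko) * t**(ki + ko - 1)
                                    for (ki, ko), pr in items if ki + ko > 0) / (2 * z_d))

    W = 1.0 - phi(t_c, t_c)
    I = 1.0 - phi(x_c, 1.0)                      # Eq. (5)
    O = 1.0 - phi(1.0, y_c)                      # Eq. (5)
    S = sum(pr * (1 - x_c**ki) * (1 - y_c**ko) for (ki, ko), pr in items)  # Eq. (6), sum form
    S_gf = 1.0 - phi(x_c, 1.0) - phi(1.0, y_c) + phi(x_c, y_c)             # Eq. (6), closed form
    T = W + S - I - O                            # Eq. (7)
    return {"W": W, "I": I, "O": O, "S": S, "S_gf": S_gf, "T": T,
            "Phi_cc": phi(x_c, y_c)}


# Hypothetical joint distribution used only for illustration.
P = {(0, 1): 0.3, (1, 0): 0.3, (3, 3): 0.4}
result = giant_components(P)
for name, value in result.items():
    print(f"{name} = {value:.6f}")
```

Computing $S$ both as the explicit sum and through the generating function gives a cheap consistency check on the implementation.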



The relations above allow us to obtain all the giant components of directed
networks with arbitrary joint in,out-degree distributions and statistically
uncorrelated vertices. It is useful to rewrite the main Eq. (6) in the form

$$S = IO + \Phi(x_c,y_c) - \Phi(x_c,1)\Phi(1,y_c). \tag{8}$$



If the joint distribution of in- and out-degrees factorizes,
$P(k_i,k_o) = P^{(i)}(k_i)P^{(o)}(k_o)$, Eq. (8) takes the simple form
$S = IO$. Otherwise, such a factorization of $S$ is impossible. At the
threshold, $x_c = y_c = 1$, and $I$, $O$, and $S$ simultaneously approach zero.
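The factorization statement can be checked numerically by evaluating $S$ through Eq. (8) for a product distribution and for a correlated one. In the sketch below the marginal, the correlated distribution, and all function names are illustrative assumptions.

```python
def fixed_point(g, tol=1e-13, max_iter=100_000):
    """Iterate t -> g(t) from t = 0; approaches the smallest root in [0, 1]."""
    t = 0.0
    for _ in range(max_iter):
        t_next = g(t)
        if abs(t_next - t) < tol:
            return t_next
        t = t_next
    return t


def sizes(P):
    """Return (I, O, S), with S evaluated through Eq. (8)."""
    items = list(P.items())
    z_d = sum(pr * ki for (ki, ko), pr in items)

    def phi(x, y):
        return sum(pr * x**ki * y**ko for (ki, ko), pr in items)

    x_c = fixed_point(lambda x: sum(pr * ko * x**ki for (ki, ko), pr in items) / z_d)
    y_c = fixed_point(lambda y: sum(pr * ki * y**ko for (ki, ko), pr in items) / z_d)
    I = 1.0 - phi(x_c, 1.0)
    O = 1.0 - phi(1.0, y_c)
    S = I * O + phi(x_c, y_c) - phi(x_c, 1.0) * phi(1.0, y_c)   # Eq. (8)
    return I, O, S


# Factorized case: product of two identical (hypothetical) marginals.
marginal = {0: 0.3, 1: 0.3, 3: 0.4}
P_fact = {(ki, ko): pi * po
          for ki, pi in marginal.items() for ko, po in marginal.items()}

# Correlated case with the same marginals but a non-factorizing joint law.
P_corr = {(0, 1): 0.3, (1, 0): 0.3, (3, 3): 0.4}

I_f, O_f, S_f = sizes(P_fact)
I_c, O_c, S_c = sizes(P_corr)
print("factorized:", S_f, I_f * O_f)
print("correlated:", S_c, I_c * O_c)
```

In the factorized case $\Phi(x_c,y_c) = \Phi(x_c,1)\Phi(1,y_c)$, so the last two terms of Eq. (8) cancel and the two printed numbers agree; in the correlated case they do not.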

We do not intend to calculate the sizes of the giant components for real
networks, for instance, for the WWW, for the following reasons. There are some
correlations between their vertices, and their joint degree distributions are
not yet known (nevertheless, see the attempt to calculate $I$ and $O$ for the
WWW in Ref. [?]). Instead, for demonstration, we consider the two simplest
non-trivial nets. In the first of them, the joint in,out-degree distribution
factorizes:
$P(k_i,k_o) = [p(\delta_{k_i,0} + \delta_{k_i,1}) + (1-2p)\delta_{k_i,3}] \times [p(\delta_{k_o,0} + \delta_{k_o,1}) + (1-2p)\delta_{k_o,3}]$.
The results, i.e., the dependences of the sizes of the giant components on
$p$, are shown in Fig. 3 and, schematically, in Fig. 2(b). The curves
$I(p) = O(p)$ approach the threshold linearly, and $S(p)$ quadratically, but
the range of the quadratic dependence is narrow. Over a wide range of $p$,
$S(p) \propto 1/2 - p$ (also see the next case, Fig. 4).
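For this factorized example the fixed-point equation can be solved in closed form. The worked steps below are a sketch derived here from Eqs. (4)-(6), not quoted from the original; the shorthand $f(x)$ for the common marginal generating function is an assumption introduced for convenience.

```latex
% Marginal generating function of the first example (shorthand introduced here):
f(x) = p\,(1 + x) + (1-2p)\,x^{3}, \qquad \Phi(x,y) = f(x)\,f(y), \qquad
z^{(d)} = f'(1) = 3 - 5p .

% Eq. (4): \Phi_1^{(i)}(x) = f(x)\,f'(1)/z^{(d)} = f(x), so x_c = f(x_c), i.e.
(x_c - 1)\,\bigl[(1-2p)\,x_c^{2} + (1-2p)\,x_c - p\bigr] = 0
\;\Longrightarrow\;
x_c = \tfrac{1}{2}\Bigl(-1 + \sqrt{1 + \tfrac{4p}{1-2p}}\Bigr).

% Eqs. (5) and (6), using y_c = x_c by symmetry:
I = O = 1 - f(x_c) = 1 - x_c , \qquad S = I\,O = (1 - x_c)^{2} .

% Threshold: x_c = 1 \iff 1 + \tfrac{4p}{1-2p} = 9 \iff p = 2/5 .
```

Near $p = 2/5$ the quantity $1 - x_c$ vanishes linearly in the distance to the threshold, so $I = O$ vanish linearly and $S = (1-x_c)^2$ quadratically, in line with the behavior of the curves described above.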

In real growing nets, the joint in,out-degree distributions do not factorize,
simply because of their growth [2], [8], [19]. Therefore, for comparison, we
calculate the sizes of the giant components for the network with the
distribution
$P(k_i,k_o) = p(\delta_{k_i,0}\delta_{k_o,1} + \delta_{k_i,1}\delta_{k_o,0}) + (1-2p)\delta_{k_i,3}\delta_{k_o,3}$.
This means that large in- and out-degrees are correlated, as are small in- and
out-degrees. The results are plotted in Fig. 4 [also see the schematic plot in
Fig. 2(b)]. One sees that, in this case, the size of the GSCC noticeably
differs from the product $IO$. We should note that a similar deviation is
present in the WWW. From the data of Ref. [?] for the WWW, $I \approx 0.490$,
$O \approx 0.489$, so $IO \approx 0.240$, which is less than the measured value
$S \approx 0.277$ but is not far from it.
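For the non-factorizing example the sums in Eqs. (4)-(6) collapse to a few terms, which allows a short sweep over $p$. This is an illustrative sketch: the function name, the tolerance, and the sampled values of `p` are assumptions, and the reduced formulas in the comments were worked out here from the stated distribution.

```python
def correlated_sizes(p, tol=1e-13, max_iter=100_000):
    """Return (I, S) for P = p(d_{ki,0}d_{ko,1} + d_{ki,1}d_{ko,0}) + (1-2p)d_{ki,3}d_{ko,3}.

    By the k_i <-> k_o symmetry of this distribution, y_c = x_c and O = I.
    """
    a = 1 - 2 * p
    z_d = p + 3 * a                      # mean in-degree = mean out-degree
    x = 0.0
    for _ in range(max_iter):
        # Eq. (4): x = [p + 3 (1 - 2p) x^3] / z^(d)
        x_next = (p + 3 * a * x**3) / z_d
        done = abs(x_next - x) < tol
        x = x_next
        if done:
            break
    I = 1 - (p + p * x + a * x**3)       # Eq. (5): 1 - Phi(x_c, 1)
    S = a * (1 - x**3) ** 2              # Eq. (6): only the (3, 3) term survives
    return I, S


for p in (0.10, 0.20, 0.30, 0.40, 0.45, 0.47):
    I, S = correlated_sizes(p)
    print(f"p={p:.2f}  I=O={I:.4f}  IO={I * I:.4f}  S={S:.4f}")

I_mid, S_mid = correlated_sizes(0.30)
I_past, S_past = correlated_sizes(0.47)   # beyond the threshold p = 6/13
```

Well below the threshold $x_c^3$ is small, so $S$ stays close to $1 - 2p$, which is proportional to $1/2 - p$.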



Figures 2-4 demonstrate that, over a wide enough range of parameters, the
following situation may be realized: the directed graph may have the GWCC but
no GSCC. So far, only the stability of the GWCC to random damage has been
discussed. Nevertheless, it is the stability of the GSCC that is most
important, e.g., for the WWW. Is it possible that the GSCC is less resilient
to failures than the GWCC? Let us briefly discuss this problem.

One can see from Eq. (3) that the GSCC is extremely resilient if the average
$\langle k_ik_o\rangle$ diverges. Let us consider two limiting situations. In
the first one, the joint in,out-degree distribution factorizes, so
$\langle k_ik_o\rangle = \langle k_i\rangle^2$. In this case, if the
distributions are of a power-law form then, for the robustness of the GSCC,
the corresponding exponent $\gamma_i$ or $\gamma_o$ should be less than or
equal to 2. This is a very strong requirement. Here, the smaller of the
exponents $\gamma_i$ and $\gamma_o$ is also equal to the exponent $\gamma$ of
the degree distribution. For the resilience of the GWCC to random damage, one
needs $\gamma \le 3$, which ensures the divergence of $\langle k^2\rangle$.
Therefore, for such distributions, when
$2 < \min(\gamma_i,\gamma_o) = \gamma \le 3$, random damage can destroy the
GSCC but cannot eliminate the GWCC.
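The exponent thresholds quoted above follow from a standard moment estimate for a power-law tail. The step below is a sketch that replaces the sum over degrees by an integral with a hypothetical lower cutoff $k_0$, which is an assumption made only for this illustration.

```latex
% n-th moment of a power-law tail P(k) \sim k^{-\gamma}, with cutoff k_0 > 0:
\langle k^{n} \rangle \;\sim\; \int_{k_0}^{\infty} k^{\,n-\gamma}\, dk
\quad\text{diverges if and only if}\quad \gamma \le n + 1 .

% n = 1:  \langle k \rangle   diverges for \gamma \le 2  (factorized case, GSCC);
% n = 2:  \langle k^2 \rangle diverges for \gamma \le 3  (GWCC).
```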

In the other limiting case, $P(k_i,k_o) = P(k_i)\delta_{k_i,k_o}$, the
correlations between in- and out-degrees are very strong. This form more
closely resembles the joint degree distributions of real growing networks. In
such an event, $\langle k_ik_o\rangle = \langle k_i^2\rangle$, and the
conditions for the resilience of the GSCC and GWCC, $\gamma \le 3$, coincide.
One should note that the real distributions lie between the two limiting cases
considered.

In summary, we have shown how to obtain the size of the giant strongly
connected component of a directed network with an arbitrary degree
distribution and statistically uncorrelated vertices. This allows us to find
all the giant components of such a graph and to describe its basic structure.
Using the simplest examples and general considerations, we have demonstrated
that the correlations between in- and out-degrees substantially influence the
global topology of the network.



FIG. 1. General structure of a directed network in the situation when the
giant strongly connected component is present. The WWW also has this structure
(compare with Fig. 9 of Ref. [?]).

If one ignores the directedness of edges, the network consists of the giant
weakly connected component (GWCC), which is actually the usual percolative
cluster, and disconnected components (DC).

Accounting for the directedness of edges, the GWCC contains the following
components:

(a) the giant strongly connected component (GSCC), that is, the set of
vertices reachable from every one of its vertices by a directed path;

(b) the giant out-component (GOUT), the set of vertices approachable from the
GSCC by a directed path;

(c) the giant in-component (GIN), which contains all vertices from which the
GSCC is approachable;

(d) the tendrils, the rest of the GWCC, i.e., the vertices which have no
access to the GSCC and are not reachable from it. In particular, this part
indeed includes something like "tendrils" [?] going out of the GIN or coming
into the GOUT, but there are also "tubes" going from the GIN to the GOUT
without passing through the GSCC, and numerous clusters which are only
"weakly" connected.

Note that the definitions of the GIN and GOUT in the present paper differ from
the definitions of Refs. [?]. Here the GSCC is included in both the GIN and
GOUT, so the GSCC is the intersection of the GIN and GOUT. We introduce the new
definitions for the sake of brevity and logical presentation (see the
calculations in the text).

FIG. 2. Schematic plots of the variations of all the giant components vs some
control parameter for the undirected network (a) and for the directed one (b).
In the undirected graph, the meanings of the giant connected component (GCC),
i.e., its percolative cluster, and the GWCC coincide.

FIG. 3. Relative sizes of the GWCC ($W$), GSCC ($S$), GIN ($I$), GOUT ($O$),
and TENDRILS ($T$) vs the parameter $p$ for the directed graph with the
factorizable joint in,out-degree distribution
$P(k_i,k_o) = [p(\delta_{k_i,0} + \delta_{k_i,1}) + (1-2p)\delta_{k_i,3}] \times [p(\delta_{k_o,0} + \delta_{k_o,1}) + (1-2p)\delta_{k_o,3}]$.
In this case, $S = IO$ and $I = O$.

FIG. 4. Relative sizes of the GWCC ($W$), GSCC ($S$), GIN ($I$), GOUT ($O$),
and TENDRILS ($T$) vs the parameter $p$ for the directed graph with the joint
in,out-degree distribution
$P(k_i,k_o) = p(\delta_{k_i,0}\delta_{k_o,1} + \delta_{k_i,1}\delta_{k_o,0}) + (1-2p)\delta_{k_i,3}\delta_{k_o,3}$
that does not factorize. This form of the distribution means that if a node is
of large in-degree, then its out-degree is also large. Also, if the in-degree
of a node is small, its out-degree is small too. The dashed curve shows the
product $IO$ (compare with the curve for $S$). In the particular case that we
consider here, $I = O$.